Abstract

We prove that the Poisson distribution maximises entropy in the class of ultra log-concave distributions, extending a result of Harremoës. The proof uses ideas concerning log-concavity, and a semigroup action involving adding Poisson variables and thinning. We go on to show that the entropy is a concave function along this semigroup.



1 Maximum entropy distributions 

It is well known that the distributions which maximise entropy under certain very natural conditions take a simple form. For example, among random variables with fixed mean and variance the entropy is maximised by the normal distribution. Similarly, for random variables with positive support and fixed mean, the entropy is maximised by the exponential distribution. The standard technique for proving such results uses the Gibbs inequality, and establishes the fact that, given a function $R(\cdot)$ and fixing $\mathbb{E}R(X)$, the maximum entropy density is of the form $\alpha\exp(-\beta R(x))$ for constants $\alpha$ and $\beta$.



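Example 1.1 Fix mean $\mu$ and variance $\sigma^2$, and write $\phi_{\mu,\sigma^2}$ for the density of $Z_{\mu,\sigma^2} \sim N(\mu,\sigma^2)$. For a random variable $Y$ with density $p_Y$ write $\Lambda(Y) = -\int p_Y(y)\log\phi_{\mu,\sigma^2}(y)\,dy$. Then for any random variable $X$ with mean $\mu$, variance $\sigma^2$ and density $p_X$,

$$\begin{aligned}
\Lambda(X) &= -\int p_X(x)\log\phi_{\mu,\sigma^2}(x)\,dx = \int p_X(x)\left(\frac{\log(2\pi\sigma^2)}{2} + \frac{(x-\mu)^2}{2\sigma^2}\right)dx \\
&= \int \phi_{\mu,\sigma^2}(x)\left(\frac{\log(2\pi\sigma^2)}{2} + \frac{(x-\mu)^2}{2\sigma^2}\right)dx = -\int \phi_{\mu,\sigma^2}(x)\log\phi_{\mu,\sigma^2}(x)\,dx = \Lambda(Z_{\mu,\sigma^2}),
\end{aligned} \qquad (1)$$

where the middle equality holds because $p_X$ and $\phi_{\mu,\sigma^2}$ have the same mean and variance.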



This means that, for any random variable $X$ with mean $\mu$ and variance $\sigma^2$, the entropy $H$ satisfies $H(X) \le H(Z_{\mu,\sigma^2})$. This is because Equation (1) gives $\Lambda(X) = \Lambda(Z_{\mu,\sigma^2}) = H(Z_{\mu,\sigma^2})$, so that

$$-H(X) + H(Z_{\mu,\sigma^2}) = \int p_X(x)\log p_X(x)\,dx - \int p_X(x)\log\phi_{\mu,\sigma^2}(x)\,dx. \qquad (2)$$

This expression is the relative entropy $D(X\|Z_{\mu,\sigma^2})$. It is non-negative by the Gibbs inequality (see Equation (18) below), with equality if and only if $p_X \equiv \phi_{\mu,\sigma^2}$.
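The argument of Example 1.1 can be checked numerically. The sketch below is our own illustration, not taken from the paper. It assumes $X$ is uniform with the same mean and variance as the normal, and the names `log_gaussian` and `lambda_uniform` are hypothetical helpers. It approximates $\Lambda(X)$ by a midpoint sum and confirms that $\Lambda(X) = H(Z_{\mu,\sigma^2})$ while $H(X) < H(Z_{\mu,\sigma^2})$. In this case the gap is $D(X\|Z_{\mu,\sigma^2}) = \frac{1}{2}\log(\pi e/6)$.

```python
import math


def log_gaussian(x, mu, sigma2):
    """log phi_{mu, sigma^2}(x) for the N(mu, sigma^2) density."""
    return -0.5 * math.log(2 * math.pi * sigma2) - (x - mu) ** 2 / (2 * sigma2)


def lambda_uniform(mu, sigma2, n=100_000):
    """Lambda(X) = -E[log phi_{mu, sigma^2}(X)] for X uniform with mean mu and
    variance sigma2, approximated by a midpoint Riemann sum over its support."""
    width = math.sqrt(12 * sigma2)  # a uniform on an interval of this width has variance sigma2
    left = mu - width / 2
    step = width / n
    return -sum(log_gaussian(left + (k + 0.5) * step, mu, sigma2) for k in range(n)) / n


mu, sigma2 = 1.0, 2.0
H_normal = 0.5 * math.log(2 * math.pi * math.e * sigma2)  # H(Z_{mu, sigma^2})
H_uniform = math.log(math.sqrt(12 * sigma2))              # H(X) for the uniform X
lam_X = lambda_uniform(mu, sigma2)
D = lam_X - H_uniform                                     # D(X || Z_{mu, sigma^2}), by Equation (2)
print(f"Lambda(X) = {lam_X:.6f}, H(Z) = {H_normal:.6f}, H(X) = {H_uniform:.6f}, D = {D:.6f}")
```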

This maximum entropy result can be regarded as the first stage in understanding the Central Limit Theorem as a result concerning maximum entropy. Note that both the class of variables with mean $\mu$ and variance $\sigma^2$ (over which the entropy is maximised) and the maximum entropy variables $Z_{\mu,\sigma^2}$ are well-behaved on convolution. Further, the normalized sum of IID copies of any random variable $X$ in this class converges in total variation to the maximum entropy distribution $Z_{\mu,\sigma^2}$. The main theorem of Barron [2] extends this to prove convergence in relative entropy, assuming that $H(X) > -\infty$.

However, for functions $R$ where $\mathbb{E}R(X)$ is not so well-behaved on convolution, the situation is more complicated. Examples of such random variables, for which we would hope to prove limit laws of a similar kind, include the Poisson and Cauchy families. In particular, we would like to understand the "Law of Small Numbers" convergence to the Poisson distribution as a maximum entropy result. Harremoës proved in [7] that the Poisson random variables $Z_\lambda$ (with mass function $\Pi_\lambda(x) = e^{-\lambda}\lambda^x/x!$ and mean $\lambda$) do satisfy a natural maximum entropy property.

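Definition 1.2 For each $\lambda \ge 0$ and $n \ge 1$ define the classes

$$B_n(\lambda) = \Big\{ S : \mathbb{E}S = \lambda,\ S = \sum_{i=1}^{n} X_i, \text{ where the } X_i \text{ are independent Bernoulli variables} \Big\},$$

and $B_\infty(\lambda) = \bigcup_n B_n(\lambda)$.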

Theorem 1.3 ([7], Theorem 8) For each $\lambda \ge 0$, the entropy of any random variable in class $B_\infty(\lambda)$ is less than or equal to the entropy of a Poisson random variable $Z_\lambda$:

$$\sup_{S \in B_\infty(\lambda)} H(S) = H(Z_\lambda).$$
Note that Shepp and Olkin, and Mateev, also showed that the maximum entropy distribution in the class $B_n(\lambda)$ is Binomial$(n, \lambda/n)$.
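Here is a rough numerical illustration of Theorem 1.3. It is our own sketch and uses only the Python standard library. The entropies of the maximisers Binomial$(n, \lambda/n)$ of the classes $B_n(\lambda)$ increase with $n$ and stay below $H(Z_\lambda)$. Monotonicity is expected because $B_n(\lambda) \subset B_{n+1}(\lambda)$: one can always add a Bernoulli(0) summand. Entropies are measured in nats, matching the paper's convention.

```python
import math


def binomial_pmf(n, p):
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def poisson_pmf(lam, n):
    """First n Poisson(lam) probabilities, via Pi(k) = Pi(k-1) * lam / k."""
    probs = [math.exp(-lam)]
    for k in range(1, n):
        probs.append(probs[-1] * lam / k)
    return probs


def entropy(pmf):
    """Discrete entropy in nats (natural logarithms)."""
    return -sum(q * math.log(q) for q in pmf if q > 0)


lam = 2.0
sizes = [4, 8, 16, 64, 256]
H_binomial = [entropy(binomial_pmf(n, lam / n)) for n in sizes]  # maximisers over B_n(lam)
H_poisson = entropy(poisson_pmf(lam, 100))                       # tail beyond 100 is negligible
```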

In this paper, we show how this maximum entropy property relates to the property of log-concavity, and give an alternative proof, which shows that $Z_\lambda$ is the maximum entropy random variable in a larger class ULC($\lambda$).

2 Log-concavity and main theorem 

First, recall the following definition: 

Definition 2.1 A non-negative sequence $(u(i), i \ge 0)$ is log-concave if, for all $i \ge 1$,

$$u(i)^2 \ge u(i+1)\,u(i-1). \qquad (3)$$

We say that a random variable $V$ taking values in $\mathbb{Z}_+$ is log-concave if its probability mass function $P_V(i) = \mathbb{P}(V = i)$ forms a log-concave sequence. Any random variable $S \in B_\infty(\lambda)$ is log-concave. This is a corollary of the following theorem (see for example Theorem 1.2 on p. 394 of [12]).

Theorem 2.2 The convolution of any two log-concave sequences is log-concave. 

Among random variables, the extreme cases of log-concavity are given by the geometric family: geometric probability mass functions are the only ones which achieve equality in Equation (3) for all $i$. The argument of Example 1.1 shows that discrete entropy is maximised under a mean constraint by the geometric distribution. Hence, in the class of log-concave random variables with a given mean, the geometric is both the extreme and the maximum entropy distribution.

Unfortunately, the sum of two independent geometric random variables with the same parameter has a negative binomial distribution. Its mass function is log-concave but no longer achieves equality in (3). This means that under the condition of log-concavity the extreme cases and the maximum entropy family are not well-behaved under convolution. This suggests that log-concavity alone is too weak a condition to motivate an entropy-theoretic understanding of the Law of Small Numbers.

A more restrictive condition than log-concavity is ultra log-concavity, defined as follows:
Definition 2.3 A non-negative sequence $(u(i), i \ge 0)$ is ultra log-concave if the sequence $(u(i)\,i!, i \ge 0)$ is log-concave. That is, for all $i \ge 1$,

$$i\,u(i)^2 \ge (i+1)\,u(i+1)\,u(i-1). \qquad (4)$$

Note that Pemantle [17], Liggett, and Wang and Yeh refer to this property as 'ultra log-concavity of order $\infty$'. See Equation (7) below for the definition of ultra log-concavity of order $n$.

An equivalent characterization of ultra log-concavity is that for any $\lambda > 0$, the sequence of ratios $(u(i)/\Pi_\lambda(i))$ is log-concave. This makes clear which probability mass functions are the extreme cases of ultra log-concavity, in the sense of equality holding in Equation (4) for each $i$. They are exactly the Poisson family, which is preserved on convolution.
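The definitions can be tested directly on truncated mass functions. This is a minimal sketch; the helper names are ours. It assumes mass functions are stored as Python lists and allows a small relative tolerance for floating-point rounding. The Poisson family attains equality in (4), a binomial is ultra log-concave, and the geometric attains equality in (3) but fails (4). The last line checks the equivalent ratio characterization for the binomial with $\lambda = 3$.

```python
import math


def poisson_pmf(lam, n):
    """First n Poisson(lam) probabilities, via Pi(k) = Pi(k-1) * lam / k."""
    probs = [math.exp(-lam)]
    for k in range(1, n):
        probs.append(probs[-1] * lam / k)
    return probs


def is_log_concave(u, rel_tol=1e-9):
    """Equation (3): u(i)^2 >= u(i+1) u(i-1) at every interior index i."""
    return all(u[i] ** 2 >= u[i + 1] * u[i - 1] * (1 - rel_tol) for i in range(1, len(u) - 1))


def is_ultra_log_concave(u, rel_tol=1e-9):
    """Equation (4): i u(i)^2 >= (i+1) u(i+1) u(i-1) at every interior index i."""
    return all(
        i * u[i] ** 2 >= (i + 1) * u[i + 1] * u[i - 1] * (1 - rel_tol)
        for i in range(1, len(u) - 1)
    )


poisson = poisson_pmf(3.0, 30)                         # equality in (4), up to rounding
binomial = [math.comb(6, k) / 2**6 for k in range(7)]  # Binomial(6, 1/2), mean 3
geometric = [0.5 ** (k + 1) for k in range(30)]        # equality in (3)
# Equivalent characterization: u(i) / Pi_lambda(i) is log-concave.
binomial_ratio = [b / q for b, q in zip(binomial, poisson_pmf(3.0, 7))]
```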

Definition 2.4 For any $\lambda \ge 0$, define ULC($\lambda$) to be the class of random variables $V$ with mean $\mathbb{E}V = \lambda$ whose probability mass function $P_V$ is ultra log-concave, that is

$$i\,P_V(i)^2 \ge (i+1)\,P_V(i+1)\,P_V(i-1) \quad \text{for all } i \ge 1. \qquad (5)$$

An equivalent characterization of the class ULC($\lambda$) is that the scaled score function introduced in [11] is decreasing, that is

$$\rho_V(i) = \frac{(i+1)\,P_V(i+1)}{\lambda\,P_V(i)} - 1 \quad \text{is a decreasing function of } i. \qquad (6)$$
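As a concrete case, the scaled score (6) of Binomial$(n, \theta)$ works out to $\rho(i) = (n\theta - i)/(n(1-\theta))$, which is decreasing as (6) requires. The sketch below is our own; `scaled_score` is a hypothetical helper that treats $P(i) = 0$ beyond the listed support. It computes $\rho$ numerically and also checks that $\sum_i P(i)\rho(i) = 0$, a fact used later in the proof of Theorem 2.5.

```python
import math


def scaled_score(p):
    """rho(i) = (i+1) P(i+1) / (lambda P(i)) - 1, Equation (6), for a pmf p with
    p[i] > 0 at every listed index; P(len(p)) is taken to be 0."""
    lam = sum(i * pi for i, pi in enumerate(p))
    padded = list(p) + [0.0]
    return [(i + 1) * padded[i + 1] / (lam * padded[i]) - 1 for i in range(len(p))]


n, prob = 6, 0.4
binomial = [math.comb(n, k) * prob**k * (1 - prob) ** (n - k) for k in range(n + 1)]
rho = scaled_score(binomial)
is_decreasing = all(a >= b for a, b in zip(rho, rho[1:]))
weighted_mean = sum(pi * r for pi, r in zip(binomial, rho))  # should vanish
```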

In Section 3 we discuss properties of the class ULC($\lambda$). For example, Lemma 3.1 shows that the classes ULC($\lambda$) are well-behaved on convolution, as Harremoës's $B_\infty(\lambda)$ are. It also shows that $B_\infty(\lambda) \subset$ ULC($\lambda$) and that $Z_\lambda \in$ ULC($\lambda$).

The main theorem of this paper is as follows: 

Theorem 2.5 For any $\lambda \ge 0$, if $X \in$ ULC($\lambda$) then the entropy of $X$ satisfies

$$H(X) \le H(Z_\lambda),$$

with equality if and only if $X \sim Z_\lambda$.

We argue that this result gives the discrete analogue of the maximum entropy property of the normal distribution described in Example 1.1. Both the class ULC($\lambda$) and the family $Z_\lambda$ of maximum entropy random variables are preserved on convolution, and ULC($\lambda$) has another desirable property, that of "accumulation". That is, suppose we fix $\lambda$ and take a triangular array of random variables $\{X_i^{(n)}\}$, where for $i = 1, \ldots, n$ the $X_i^{(n)}$ are IID and in ULC($\lambda/n$). The techniques of [11] can be extended to show that as $n \to \infty$ the sum $X_1^{(n)} + \cdots + X_n^{(n)}$ converges to $Z_\lambda$ in total variation (and indeed in relative entropy).

It is natural to wonder whether Theorem 2.5 is optimal. Alternatively, for each $\lambda$ there might exist a strictly larger class $C(\lambda)$ such that (i) the $C(\lambda)$ are well-behaved on convolution, (ii) $Z_\lambda$ is the maximum entropy random variable in each $C(\lambda)$, and (iii) accumulation holds. We do not offer a complete answer to this question. However, as discussed above, the class of log-concave variables is too large and fails both conditions (i) and (ii).

For larger classes $C(\lambda)$, again consider a triangular array where $\{X_i^{(n)}\} \in C(\lambda/n)$. Write $p_n = \mathbb{P}(X_1^{(n)} > 0)$ and $Q_n$ for the conditional distribution $Q_n(x) = \mathbb{P}(X_1^{(n)} = x \mid X_1^{(n)} > 0)$. Suppose the classes $C(\lambda)$ are large enough that we can find a subsequence $(n_k)$ such that $Q_{n_k} \to Q$ and $\mathbb{E}Q_{n_k} \to \mathbb{E}Q$. Then the sum $X_1^{(n_k)} + \cdots + X_{n_k}^{(n_k)}$ converges to a compound Poisson distribution CP$(\lambda/\mathbb{E}Q, Q)$. Thus, if the $C(\lambda)$ are large enough that we can find a limit $Q \ne \delta_1$, then the limit is not equal to $Z_\lambda$, and so the property of accumulation fails. (Note that for $X \in$ ULC($\lambda$), $\mathbb{P}(X \ge 2 \mid X > 0) \le (\exp(\lambda) - \lambda - 1)/\lambda$, so the only limiting conditional distribution is indeed $\delta_1$.)

The proof of Theorem 2.5 is given in Sections 3 and 4. It is based on a family of maps $(\mathbb{U}_\alpha)$ which we introduce in Definition 4.1 below. This map mimics the role played by the Ornstein-Uhlenbeck semigroup in the normal case. There, differentiating along the semigroup shows that the probability densities satisfy a partial differential equation, the heat equation. Hence the derivative of relative entropy is the Fisher information, a fact referred to as the de Bruijn identity. This property is used by Stam [20] and Blachman to prove the Entropy Power Inequality, which gives a sharp bound on the behaviour of continuous entropy on convolution. It is possible that a version of $\mathbb{U}_\alpha$ may give a similar result for discrete entropy.

As $\alpha$ varies between 1 and 0, the map $\mathbb{U}_\alpha$ interpolates between a given random variable $X$ and a Poisson random variable with the same mean. By establishing monotonicity properties with respect to $\alpha$, we obtain the maximum entropy result, Theorem 2.5. The action of $\mathbb{U}_\alpha$ is to thin $X$ and then to add an independent Poisson random variable to it. In Section 4 we use $\mathbb{U}_\alpha$ to establish the maximum entropy property of the Poisson distribution. The key expressions are the derivative formulas of Corollary 4.2, which show that the resulting probabilities satisfy an analogue of the heat equation.

We abuse terminology slightly in referring to $\mathbb{U}_\alpha$ as a semigroup. In fact (see Equation (12) below) $\mathbb{U}_{\alpha_1} \circ \mathbb{U}_{\alpha_2} = \mathbb{U}_{\alpha_1\alpha_2}$. To obtain the more familiar relation $\mathbb{W}_{\theta_1} \circ \mathbb{W}_{\theta_2} = \mathbb{W}_{\theta_1+\theta_2}$ we would require a reparametrization $\mathbb{W}_\theta = \mathbb{U}_{\exp(-\theta)}$, reminiscent of Bakry and Émery. However, in Section 5 we argue that $\mathbb{U}_\alpha$ has the 'right' parametrization. We do this by proving Theorem 5.1, which shows that $H(\mathbb{U}_\alpha X)$ is not only monotonically decreasing in $\alpha$, but is indeed a concave function of $\alpha$. We prove this by writing $H(\mathbb{U}_\alpha X) = \Lambda(\mathbb{U}_\alpha X) - D(\mathbb{U}_\alpha X\|Z_\lambda)$, and differentiating both terms.

In contrast to conventions in Information Theory, throughout the paper entropy is defined using logarithms to base $e$. Dividing by $\log 2$ restores the standard definitions (in bits).

3 Properties of ULC($\lambda$) and definitions of maps

In this section, we first note some results concerning properties of the classes ULC($\lambda$). We then define actions of addition and thinning that will be used to prove the main results of the paper.

Lemma 3.1 For any $\lambda \ge 0$ and $\mu \ge 0$:

1. If $V \in$ ULC($\lambda$) then it is log-concave.

2. The Poisson random variable $Z_\lambda \in$ ULC($\lambda$).

3. The classes are closed on convolution: that is, for independent $U \in$ ULC($\lambda$) and $V \in$ ULC($\mu$), the sum $U + V \in$ ULC($\lambda + \mu$).

4. $B_\infty(\lambda) \subset$ ULC($\lambda$).

Proof Parts 1 and 2 follow from the definitions; for Part 1, note that (5) implies (3) since $(i+1)/i \ge 1$. Theorem 1 of Walkup [22] implies that Part 3 holds, though a more direct proof is given by Theorem 2 of Liggett. Part 4 follows from Part 3. This is because any Bernoulli($p$) mass function scaled by $\Pi_p$ is supported on only 2 points, and so belongs to ULC($p$). □
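Part 3 of Lemma 3.1 can be illustrated numerically. In this sketch (our own, with hypothetical helper names), a sum of three independent Bernoulli variables, which lies in $B_3(1.6)$, is convolved with a truncated Poisson(1.5) mass function. Only the first `grid` entries of the convolution are kept, because those are computed exactly. The result passes test (4) and has mean $3.1$, as Part 3 predicts.

```python
import math


def poisson_pmf(lam, n):
    """First n Poisson(lam) probabilities, via Pi(k) = Pi(k-1) * lam / k."""
    probs = [math.exp(-lam)]
    for k in range(1, n):
        probs.append(probs[-1] * lam / k)
    return probs


def convolve(p, q):
    """pmf of U + V for independent U ~ p and V ~ q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def is_ultra_log_concave(u, rel_tol=1e-9):
    """Equation (4) at every interior index, with a small rounding tolerance."""
    return all(
        i * u[i] ** 2 >= (i + 1) * u[i + 1] * u[i - 1] * (1 - rel_tol)
        for i in range(1, len(u) - 1)
    )


def mean(p):
    return sum(k * pk for k, pk in enumerate(p))


bernoulli_sum = [1.0]
for prob in (0.2, 0.5, 0.9):  # a member of B_3(1.6)
    bernoulli_sum = convolve(bernoulli_sum, [1 - prob, prob])

grid = 40
poisson = poisson_pmf(1.5, grid)
combined = convolve(bernoulli_sum, poisson)[:grid]  # the first `grid` entries are exact
```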

We can give an alternative proof of Part 3 of Lemma 3.1 using ideas of negative association developed by Efron and by Joag-Dev and Proschan. The key result is that if $U$ and $V$ are log-concave random variables, then for any decreasing function $\phi$,

$$\mathbb{E}[\phi(U, V) \mid U + V = w] \text{ is a decreasing function of } w.$$

Now, the Lemma on p. 471 of [11] shows the following for independent $U$ and $V$. Writing $\alpha = \mathbb{E}U/(\mathbb{E}U + \mathbb{E}V)$ and using the score function of Equation (6),

$$\rho_{U+V}(w) = \mathbb{E}\big[\alpha\rho_U(U) + (1 - \alpha)\rho_V(V) \mid U + V = w\big],$$

so that if $\rho_U$ and $\rho_V$ are decreasing, then so is $\rho_{U+V}$.
Remark 3.2 For each $n$ and $\lambda > 0$, the Poisson mass function $\Pi_\lambda$ is not supported on $[0, n]$. Hence $Z_\lambda \notin B_n(\lambda)$, so that $Z_\lambda \notin B_\infty(\lambda)$. Indeed, the class of ultra log-concave random variables is non-trivially larger than the class of Bernoulli sums. For all random variables $V \in B_n(\lambda)$, the Newton inequalities (see for example Theorem 1.1 of Niculescu) imply that the scaled mass function $P_V(i)/\binom{n}{i}$ is log-concave. Therefore, for all $1 \le i < n$:

$$\frac{i\,P_V(i)^2}{(i+1)\,P_V(i+1)\,P_V(i-1)} \ge \frac{n-i+1}{n-i}. \qquad (7)$$

Pemantle [17] and Liggett call this property "ultra log-concavity of order $n$". It is strictly more restrictive than simple ultra log-concavity, which (see Equation (5)) only requires a lower bound of 1 on the right-hand side.
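The gap between Equation (7) and ultra log-concavity can be seen numerically. This sketch is our own; `ulc_of_order` is a hypothetical helper that cross-multiplies (7) to avoid division. It checks that a Bernoulli sum with $n = 4$ satisfies (7) and that Binomial$(4, 0.3)$ attains equality (so it passes up to rounding). It also checks that a Poisson mass function with the same mean fails (7), even though it satisfies (5) with equality.

```python
import math


def poisson_pmf(lam, n):
    """First n Poisson(lam) probabilities, via Pi(k) = Pi(k-1) * lam / k."""
    probs = [math.exp(-lam)]
    for k in range(1, n):
        probs.append(probs[-1] * lam / k)
    return probs


def bernoulli_sum_pmf(probs):
    """pmf of a sum of independent Bernoulli(prob) variables."""
    pmf = [1.0]
    for prob in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, pk in enumerate(pmf):
            nxt[k] += (1 - prob) * pk
            nxt[k + 1] += prob * pk
        pmf = nxt
    return pmf


def ulc_of_order(p, n, rel_tol=1e-9):
    """Equation (7), cross-multiplied: i (n-i) P(i)^2 >= (i+1)(n-i+1) P(i+1) P(i-1), 1 <= i < n."""
    return all(
        i * (n - i) * p[i] ** 2 >= (i + 1) * (n - i + 1) * p[i + 1] * p[i - 1] * (1 - rel_tol)
        for i in range(1, n)
    )


n = 4
probs = [0.1, 0.5, 0.8, 0.3]
bern = bernoulli_sum_pmf(probs)
binom = [math.comb(n, k) * 0.3**k * 0.7 ** (n - k) for k in range(n + 1)]
pois = poisson_pmf(sum(probs), n + 1)
```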

Next we introduce the maps $\mathbb{S}_\beta$ and $\mathbb{T}_\alpha$ that will be key to our results.
Definition 3.3 Define the maps $\mathbb{S}_\beta$ and $\mathbb{T}_\alpha$ which act as follows:

1. For any $\beta \ge 0$, define the map $\mathbb{S}_\beta$ that maps random variable $X$ to random variable

$$\mathbb{S}_\beta X \sim X + Z_\beta,$$

where $Z_\beta$ is a Poisson($\beta$) random variable independent of $X$.

2. For any $0 \le \alpha \le 1$, define the map $\mathbb{T}_\alpha$ that maps random variable $X$ to random variable

$$\mathbb{T}_\alpha X \sim \sum_{i=1}^{X} B_i(\alpha),$$

where the $B_i(\alpha)$ are Bernoulli($\alpha$) random variables, independent of each other and of $X$. This is the thinning operation introduced by Rényi [18].
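Both maps can be sketched on mass functions stored as Python lists on a finite grid $\{0, \ldots, N-1\}$. This is a minimal sketch of our own. It assumes the grid is long enough that the truncated Poisson tail is negligible, and the names `thin` and `add_poisson` are hypothetical. Thinning uses $\mathbb{P}(\mathbb{T}_\alpha X = z) = \sum_{x \ge z}\mathbb{P}(X = x)\binom{x}{z}\alpha^z(1-\alpha)^{x-z}$, and $\mathbb{S}_\beta$ is a convolution with the Poisson($\beta$) mass function.

```python
import math


def poisson_pmf(lam, n):
    """First n Poisson(lam) probabilities, via Pi(k) = Pi(k-1) * lam / k."""
    probs = [math.exp(-lam)]
    for k in range(1, n):
        probs.append(probs[-1] * lam / k)
    return probs


def thin(p, alpha):
    """T_alpha: pmf of B_1(alpha) + ... + B_X(alpha) when X has pmf p."""
    q = [0.0] * len(p)
    for x, px in enumerate(p):
        for z in range(x + 1):
            q[z] += px * math.comb(x, z) * alpha**z * (1 - alpha) ** (x - z)
    return q


def add_poisson(p, beta):
    """S_beta: pmf of X + Z_beta with Z_beta ~ Poisson(beta), kept on the same grid as p."""
    pois = poisson_pmf(beta, len(p))
    return [sum(p[k] * pois[z - k] for k in range(z + 1)) for z in range(len(p))]


def mean(p):
    return sum(k * pk for k, pk in enumerate(p))


grid = 60
p = [0.2, 0.3, 0.5] + [0.0] * (grid - 3)  # hypothetical X on {0, 1, 2}, mean 1.3
thinned = thin(p, 0.4)
shifted = add_poisson(p, 2.0)
```

Thinning multiplies the mean by $\alpha$ and $\mathbb{S}_\beta$ adds $\beta$, so the two means here should be $0.4 \times 1.3$ and $1.3 + 2$.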

We now show how these maps interact: 

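Lemma 3.4 For any $0 \le \alpha, \alpha_1, \alpha_2 \le 1$ and for any $\beta, \beta_1, \beta_2 \ge 0$, the maps defined in Definition 3.3 satisfy:

1. $\mathbb{S}_{\beta_1} \circ \mathbb{S}_{\beta_2} = \mathbb{S}_{\beta_2} \circ \mathbb{S}_{\beta_1} = \mathbb{S}_{\beta_1+\beta_2}$.

2. $\mathbb{T}_{\alpha_1} \circ \mathbb{T}_{\alpha_2} = \mathbb{T}_{\alpha_2} \circ \mathbb{T}_{\alpha_1} = \mathbb{T}_{\alpha_1\alpha_2}$.

3. $\mathbb{T}_\alpha \circ \mathbb{S}_\beta = \mathbb{S}_{\alpha\beta} \circ \mathbb{T}_\alpha$.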
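Proof Part 1 follows immediately from the definition. To prove Part 2, we write $B_i(\alpha_1\alpha_2) = B_i(\alpha_1)B_i(\alpha_2)$, where $B_i(\alpha_1)$ and $B_i(\alpha_2)$ are independent. Then for any $X$

$$\mathbb{T}_{\alpha_1\alpha_2}X = \sum_{i=1}^{X} B_i(\alpha_1)B_i(\alpha_2) = \sum_{i \le X:\ B_i(\alpha_1)=1} B_i(\alpha_2) \sim \mathbb{T}_{\alpha_2}(\mathbb{T}_{\alpha_1}X),$$

and exchanging the roles of $\alpha_1$ and $\alpha_2$ gives the remaining identity. Part 3 uses the fact that the sum of a Poisson number of IID Bernoulli random variables is itself Poisson. This means that for any $X$

$$\mathbb{T}_\alpha(\mathbb{S}_\beta X) = \sum_{i=1}^{X+Z_\beta} B_i(\alpha) = \sum_{i=1}^{X} B_i(\alpha) + \sum_{i=X+1}^{X+Z_\beta} B_i(\alpha) \sim \mathbb{T}_\alpha X + Z_{\alpha\beta} \sim \mathbb{S}_{\alpha\beta}(\mathbb{T}_\alpha X),$$

as required. □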
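The three identities of Lemma 3.4 can be checked numerically with the same kind of list-based sketch. This is our own sketch: the helper names are hypothetical and $X$ is an arbitrary mass function on $\{0, \ldots, 3\}$. Each comparison should agree up to floating-point rounding.

```python
import math


def poisson_pmf(lam, n):
    """First n Poisson(lam) probabilities, via Pi(k) = Pi(k-1) * lam / k."""
    probs = [math.exp(-lam)]
    for k in range(1, n):
        probs.append(probs[-1] * lam / k)
    return probs


def thin(p, alpha):
    """T_alpha: pmf of B_1(alpha) + ... + B_X(alpha) when X has pmf p."""
    q = [0.0] * len(p)
    for x, px in enumerate(p):
        for z in range(x + 1):
            q[z] += px * math.comb(x, z) * alpha**z * (1 - alpha) ** (x - z)
    return q


def add_poisson(p, beta):
    """S_beta: pmf of X + Poisson(beta), kept on the same grid as p."""
    pois = poisson_pmf(beta, len(p))
    return [sum(p[k] * pois[z - k] for k in range(z + 1)) for z in range(len(p))]


def max_diff(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


grid = 60
p = [0.1, 0.25, 0.4, 0.25] + [0.0] * (grid - 4)  # hypothetical X on {0, ..., 3}
alpha, alpha1, alpha2 = 0.3, 0.5, 0.7
beta, beta1, beta2 = 1.2, 0.4, 0.9

part1 = max_diff(add_poisson(add_poisson(p, beta1), beta2), add_poisson(p, beta1 + beta2))
part2 = max_diff(thin(thin(p, alpha2), alpha1), thin(p, alpha1 * alpha2))
part3 = max_diff(thin(add_poisson(p, beta), alpha), add_poisson(thin(p, alpha), alpha * beta))
```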

Definition 3.5 Define the two-parameter family of maps

$$\mathbb{V}_{\alpha,\beta} = \mathbb{S}_\beta \circ \mathbb{T}_\alpha, \quad \text{for } 0 \le \alpha \le 1,\ \beta \ge 0.$$

As in Stam [20] and Blachman, we will differentiate along this family of maps, and see that the resulting probabilities satisfy a partial differential-difference equation.

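Proposition 3.6 Given $X$ with mean $\lambda$, writing $P_\alpha(z) = \mathbb{P}(\mathbb{V}_{\alpha,f(\alpha)}X = z)$, we have

$$\frac{\partial}{\partial\alpha}P_\alpha(z) = g(\alpha)\big(P_\alpha(z) - P_\alpha(z-1)\big) - \frac{1}{\alpha}\big((z+1)P_\alpha(z+1) - zP_\alpha(z)\big), \qquad (8)$$

where $g(\alpha) = f(\alpha)/\alpha - f'(\alpha)$. Equivalently, $f(\alpha) = \alpha f(1) + \alpha\int_\alpha^1 g(\beta)/\beta\,d\beta$.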
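Proof We consider probability generating functions (pgfs). Notice that

$$\mathbb{P}(\mathbb{T}_\alpha X = z) = \sum_{x \ge z}\mathbb{P}(X = x)\binom{x}{z}\alpha^z(1-\alpha)^{x-z}.$$

So if $X$ has pgf $G_X(t) = \sum_x \mathbb{P}(X = x)t^x = \mathbb{E}t^X$, then $\mathbb{T}_\alpha X$ has pgf

$$\sum_z t^z\sum_{x \ge z}\mathbb{P}(X = x)\binom{x}{z}\alpha^z(1-\alpha)^{x-z} = \sum_x \mathbb{P}(X = x)\sum_{z=0}^{x}\binom{x}{z}(t\alpha)^z(1-\alpha)^{x-z} = G_X(t\alpha + 1 - \alpha).$$

If $Y$ has pgf $G_Y(t)$ then $\mathbb{S}_\beta Y$ has pgf $G_Y(t)\exp(\beta(t-1))$. Overall then, $\mathbb{V}_{\alpha,f(\alpha)}X$ has pgf

$$G_\alpha(t) = G_X(t\alpha + (1-\alpha))\exp(f(\alpha)(t-1)), \qquad (9)$$

which satisfies

$$\frac{\partial}{\partial\alpha}G_\alpha(t) = (1-t)\left(g(\alpha)G_\alpha(t) - \frac{1}{\alpha}\frac{\partial}{\partial t}G_\alpha(t)\right).$$

Comparing coefficients of $t^z$ gives the result. □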
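Equation (8) can be sanity-checked by finite differences. In this sketch (our own), $f(\alpha) = 0.7(1-\alpha)^2$ is an arbitrary smooth choice made purely for illustration, so $g(\alpha) = f(\alpha)/\alpha - f'(\alpha)$. A central difference in $\alpha$ of $\mathbb{P}(\mathbb{V}_{\alpha,f(\alpha)}X = z)$ is compared with the right-hand side of (8). The helper names are hypothetical.

```python
import math


def poisson_pmf(lam, n):
    """First n Poisson(lam) probabilities, via Pi(k) = Pi(k-1) * lam / k."""
    probs = [math.exp(-lam)]
    for k in range(1, n):
        probs.append(probs[-1] * lam / k)
    return probs


def thin(p, alpha):
    """T_alpha: pmf of B_1(alpha) + ... + B_X(alpha) when X has pmf p."""
    q = [0.0] * len(p)
    for x, px in enumerate(p):
        for z in range(x + 1):
            q[z] += px * math.comb(x, z) * alpha**z * (1 - alpha) ** (x - z)
    return q


def add_poisson(p, beta):
    """S_beta: pmf of X + Poisson(beta), kept on the same grid as p."""
    pois = poisson_pmf(beta, len(p))
    return [sum(p[k] * pois[z - k] for k in range(z + 1)) for z in range(len(p))]


def f(a):
    """Hypothetical smooth choice of f, for illustration only."""
    return 0.7 * (1 - a) ** 2


def f_prime(a):
    return -1.4 * (1 - a)


def V(p, alpha):
    """V_{alpha, f(alpha)} = S_{f(alpha)} o T_alpha."""
    return add_poisson(thin(p, alpha), f(alpha))


grid = 60
p = [0.1, 0.25, 0.4, 0.25] + [0.0] * (grid - 4)  # hypothetical X on {0, ..., 3}
alpha, h = 0.6, 1e-5

P = V(p, alpha)
g = f(alpha) / alpha - f_prime(alpha)
numeric = [(a - b) / (2 * h) for a, b in zip(V(p, alpha + h), V(p, alpha - h))]
predicted = [
    g * (P[z] - (P[z - 1] if z > 0 else 0.0)) - ((z + 1) * P[z + 1] - z * P[z]) / alpha
    for z in range(grid - 1)
]
max_error = max(abs(a - b) for a, b in zip(numeric, predicted))
```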

We now prove that both maps $\mathbb{S}_\beta$ and $\mathbb{T}_\alpha$ preserve ultra log-concavity.

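Proposition 3.7 If $X$ is an ultra log-concave random variable then for any $\alpha \in [0, 1]$ and $\beta \ge 0$ the random variables $\mathbb{S}_\beta X$ and $\mathbb{T}_\alpha X$ are both ultra log-concave, and hence so is $\mathbb{V}_{\alpha,\beta}X$.

Proof The first result follows by Part 3 of Lemma 3.1. We prove the second result using the case $f(\alpha) \equiv 0$ of Proposition 3.6. Writing $P_\alpha(x) = \mathbb{P}(\mathbb{T}_\alpha X = x)$, that case tells us that the derivative is

$$\frac{\partial}{\partial\alpha}P_\alpha(x) = \frac{1}{\alpha}\big(xP_\alpha(x) - (x+1)P_\alpha(x+1)\big). \qquad (10)$$

Write $g_\alpha(z) = zP_\alpha(z)^2 - (z+1)P_\alpha(z+1)P_\alpha(z-1)$. Then Equation (10) gives, for each $z$,

$$\begin{aligned}
\frac{\partial}{\partial\alpha}g_\alpha(z) &= \frac{2z\,g_\alpha(z)}{\alpha} + \frac{z+1}{\alpha}\big((z+2)P_\alpha(z+2)P_\alpha(z-1) - zP_\alpha(z)P_\alpha(z+1)\big) \\
&= \frac{g_\alpha(z)}{\alpha}\left(2z - \frac{(z+2)P_\alpha(z+2)}{P_\alpha(z+1)}\right) - \frac{zP_\alpha(z)}{\alpha P_\alpha(z+1)}\,g_\alpha(z+1).
\end{aligned} \qquad (11)$$

We know that $P_\alpha$ is ultra log-concave for $\alpha = 1$, and will show that this holds for smaller values of $\alpha$. Suppose that for some $\alpha$, $P_\alpha$ is ultra log-concave, so $g_\alpha(z) \ge 0$ for each $z$. If for some $z$ we have $g_\alpha(z) = 0$, then since $g_\alpha(z+1) \ge 0$, Equation (11) simplifies to give $\frac{\partial}{\partial\alpha}g_\alpha(z) \le 0$. Hence, by continuity, no value of $z$ has $g_\alpha(z)$ becoming negative as $\alpha$ gets smaller, so ultra log-concavity is preserved. □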
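Here is a numerical spot-check of Proposition 3.7; it is our own sketch, not a proof. We start from a Bernoulli sum plus an independent Poisson(1) variable, which is ultra log-concave by Lemma 3.1 but not in $B_\infty$. We apply $\mathbb{V}_{\alpha,\beta}$ over a grid of $\alpha$ and $\beta$, then test (4) on the first 30 probabilities, where grid truncation has no visible effect. The helper names are hypothetical.

```python
import math


def poisson_pmf(lam, n):
    """First n Poisson(lam) probabilities, via Pi(k) = Pi(k-1) * lam / k."""
    probs = [math.exp(-lam)]
    for k in range(1, n):
        probs.append(probs[-1] * lam / k)
    return probs


def thin(p, alpha):
    """T_alpha: pmf of B_1(alpha) + ... + B_X(alpha) when X has pmf p."""
    q = [0.0] * len(p)
    for x, px in enumerate(p):
        for z in range(x + 1):
            q[z] += px * math.comb(x, z) * alpha**z * (1 - alpha) ** (x - z)
    return q


def add_poisson(p, beta):
    """S_beta: pmf of X + Poisson(beta), kept on the same grid as p."""
    pois = poisson_pmf(beta, len(p))
    return [sum(p[k] * pois[z - k] for k in range(z + 1)) for z in range(len(p))]


def bernoulli_sum_pmf(probs, grid):
    """pmf of a sum of independent Bernoulli(prob) variables, padded to `grid` points."""
    pmf = [1.0] + [0.0] * (grid - 1)
    for prob in probs:
        pmf = [(1 - prob) * pmf[k] + (prob * pmf[k - 1] if k > 0 else 0.0) for k in range(grid)]
    return pmf


def is_ultra_log_concave(u, rel_tol=1e-9):
    """Equation (4) at every interior index, with a small rounding tolerance."""
    return all(
        i * u[i] ** 2 >= (i + 1) * u[i + 1] * u[i - 1] * (1 - rel_tol)
        for i in range(1, len(u) - 1)
    )


grid, checked = 60, 30
x_pmf = add_poisson(bernoulli_sum_pmf([0.3, 0.6, 0.9], grid), 1.0)  # in ULC(2.8)
results = [
    is_ultra_log_concave(add_poisson(thin(x_pmf, alpha), beta)[:checked])
    for alpha in [k / 10 for k in range(11)]
    for beta in (0.0, 0.5, 2.0)
]
```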



4 Maximum entropy result for the Poisson 

We now prove the maximum entropy property of the Poisson distribution within the class ULC($\lambda$). We choose a one-parameter family of maps $(\mathbb{U}_\alpha)$, which have the property that they preserve the mean $\lambda$.

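Definition 4.1 Given mean $\lambda \ge 0$ and $0 \le \alpha \le 1$, define the combined map

$$\mathbb{U}_\alpha = \mathbb{V}_{\alpha,\lambda(1-\alpha)}.$$

Equivalently $\mathbb{U}_\alpha = \mathbb{S}_{\lambda(1-\alpha)} \circ \mathbb{T}_\alpha$ or, for $\alpha > 0$, $\mathbb{U}_\alpha = \mathbb{T}_\alpha \circ \mathbb{S}_{\lambda(1/\alpha-1)}$.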
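Note that the maps $\mathbb{U}_\alpha$ have a semigroup-like structure. By Lemma 3.4 we know that

$$(\mathbb{S}_{\lambda(1-\alpha_1)} \circ \mathbb{T}_{\alpha_1}) \circ (\mathbb{S}_{\lambda(1-\alpha_2)} \circ \mathbb{T}_{\alpha_2}) = (\mathbb{S}_{\lambda(1-\alpha_1)} \circ \mathbb{S}_{\alpha_1\lambda(1-\alpha_2)}) \circ (\mathbb{T}_{\alpha_1} \circ \mathbb{T}_{\alpha_2}) = \mathbb{S}_{\lambda(1-\alpha_1\alpha_2)} \circ \mathbb{T}_{\alpha_1\alpha_2}.$$

That is, we know that

$$\mathbb{U}_{\alpha_1} \circ \mathbb{U}_{\alpha_2} = \mathbb{U}_{\alpha_1\alpha_2}. \qquad (12)$$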
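The semigroup relation (12) and the mean-preserving property can be checked with a list-based sketch. In this sketch of our own, `U` is a hypothetical helper implementing $\mathbb{S}_{\lambda(1-\alpha)} \circ \mathbb{T}_\alpha$, with $\lambda$ taken as the mean of the input. The checks also confirm that $\mathbb{U}_0$ returns the Poisson($\lambda$) mass function and that $\mathbb{U}_1$ leaves $X$ unchanged.

```python
import math


def poisson_pmf(lam, n):
    """First n Poisson(lam) probabilities, via Pi(k) = Pi(k-1) * lam / k."""
    probs = [math.exp(-lam)]
    for k in range(1, n):
        probs.append(probs[-1] * lam / k)
    return probs


def thin(p, alpha):
    """T_alpha: pmf of B_1(alpha) + ... + B_X(alpha) when X has pmf p."""
    q = [0.0] * len(p)
    for x, px in enumerate(p):
        for z in range(x + 1):
            q[z] += px * math.comb(x, z) * alpha**z * (1 - alpha) ** (x - z)
    return q


def add_poisson(p, beta):
    """S_beta: pmf of X + Poisson(beta), kept on the same grid as p."""
    pois = poisson_pmf(beta, len(p))
    return [sum(p[k] * pois[z - k] for k in range(z + 1)) for z in range(len(p))]


def mean(p):
    return sum(k * pk for k, pk in enumerate(p))


def U(p, alpha):
    """U_alpha = S_{lambda(1 - alpha)} o T_alpha, with lambda the mean of p."""
    return add_poisson(thin(p, alpha), mean(p) * (1 - alpha))


grid = 60
p = [0.1, 0.25, 0.4, 0.25] + [0.0] * (grid - 4)  # hypothetical X on {0, ..., 3}
lam = mean(p)                                    # 1.8
a1, a2 = 0.5, 0.7
semigroup_error = max(abs(x - y) for x, y in zip(U(U(p, a2), a1), U(p, a1 * a2)))
```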

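Equation (8) can be simplified with the introduction of some helpful notation. Define $\Delta$ and its adjoint $\Delta^*$ by $\Delta p(x) = p(x+1) - p(x)$ and $\Delta^* q(x) = q(x-1) - q(x)$, with the convention $q(-1) = 0$. These maps $\Delta$ and $\Delta^*$ are indeed adjoint, since for any functions $p$, $q$ for which the sums converge:

$$\sum_x \big(\Delta p(x)\big)q(x) = \sum_x p(x+1)q(x) - \sum_x p(x)q(x) = \sum_x p(x)q(x-1) - \sum_x p(x)q(x) = \sum_x p(x)\big(\Delta^* q(x)\big). \qquad (13)$$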

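We write $\rho_\alpha(z)$ for $\rho_{\mathbb{U}_\alpha X}(z) = (z+1)P_\alpha(z+1)/(\lambda P_\alpha(z)) - 1$. Note that

$$\frac{(z+1)P_\alpha(z+1)}{\lambda} - P_\alpha(z) = P_\alpha(z)\rho_\alpha(z) = \Pi_\lambda(z)\left(\frac{P_\alpha(z+1)}{\Pi_\lambda(z+1)} - \frac{P_\alpha(z)}{\Pi_\lambda(z)}\right).$$

Using this, we can give two alternative reformulations of Equation (8) in the case where $\mathbb{V}_{\alpha,f(\alpha)} = \mathbb{U}_\alpha$, that is, $f(\alpha) = \lambda(1-\alpha)$ and $g(\alpha) = \lambda/\alpha$.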



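Corollary 4.2 Writing $P_\alpha(z) = \mathbb{P}(\mathbb{U}_\alpha X = z)$:

$$\frac{\partial}{\partial\alpha}P_\alpha(z) = \frac{\lambda}{\alpha}\Delta^*\big(P_\alpha(z)\rho_\alpha(z)\big). \qquad (14)$$

Secondly, in a form more reminiscent of the heat equation:

$$\frac{\partial}{\partial\alpha}P_\alpha(z) = \frac{\lambda}{\alpha}\Delta^*\left(\Pi_\lambda(z)\,\Delta\left(\frac{P_\alpha(z)}{\Pi_\lambda(z)}\right)\right).$$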
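Both forms of Corollary 4.2 can be compared against a central finite difference in $\alpha$. This is our own sketch with hypothetical helper names. Here `flux` is $P_\alpha\rho_\alpha$ written as $(z+1)P_\alpha(z+1)/\lambda - P_\alpha(z)$, and `flux_heat` is the same quantity computed as $\Pi_\lambda\Delta(P_\alpha/\Pi_\lambda)$. The helper `delta_star` implements $\Delta^*$ with the convention $q(-1) = 0$.

```python
import math


def poisson_pmf(lam, n):
    """First n Poisson(lam) probabilities, via Pi(k) = Pi(k-1) * lam / k."""
    probs = [math.exp(-lam)]
    for k in range(1, n):
        probs.append(probs[-1] * lam / k)
    return probs


def thin(p, alpha):
    """T_alpha: pmf of B_1(alpha) + ... + B_X(alpha) when X has pmf p."""
    q = [0.0] * len(p)
    for x, px in enumerate(p):
        for z in range(x + 1):
            q[z] += px * math.comb(x, z) * alpha**z * (1 - alpha) ** (x - z)
    return q


def add_poisson(p, beta):
    """S_beta: pmf of X + Poisson(beta), kept on the same grid as p."""
    pois = poisson_pmf(beta, len(p))
    return [sum(p[k] * pois[z - k] for k in range(z + 1)) for z in range(len(p))]


def mean(p):
    return sum(k * pk for k, pk in enumerate(p))


def U(p, alpha):
    """U_alpha = S_{lambda(1 - alpha)} o T_alpha, with lambda the mean of p."""
    return add_poisson(thin(p, alpha), mean(p) * (1 - alpha))


def delta_star(q):
    """Delta* q(x) = q(x - 1) - q(x), with the convention q(-1) = 0."""
    return [(q[x - 1] if x > 0 else 0.0) - q[x] for x in range(len(q))]


grid = 60
p = [0.1, 0.25, 0.4, 0.25] + [0.0] * (grid - 4)  # hypothetical X on {0, ..., 3}
lam = mean(p)
alpha, h = 0.5, 1e-5

P = U(p, alpha)
numeric = [(a - b) / (2 * h) for a, b in zip(U(p, alpha + h), U(p, alpha - h))]

flux = [(z + 1) * P[z + 1] / lam - P[z] for z in range(grid - 1)]  # P_alpha * rho_alpha
pois = poisson_pmf(lam, grid)
flux_heat = [pois[z] * (P[z + 1] / pois[z + 1] - P[z] / pois[z]) for z in range(grid - 1)]

err_14 = max(abs(a - lam / alpha * d) for a, d in zip(numeric, delta_star(flux)))
err_heat = max(abs(a - lam / alpha * d) for a, d in zip(numeric, delta_star(flux_heat)))
```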

Note that we can also view $\mathbb{U}_\alpha$ as the action of the M/M/∞ queue run for time $t = -\log\alpha$. In particular, the heat-equation form of Corollary 4.2, which represents the evolution of probabilities under $\mathbb{U}_\alpha$, is the adjoint of

$$Lf(z) = -\lambda\,\Delta\Delta^* f(z) + (z - \lambda)\,\Delta^* f(z),$$

which represents the evolution of functions. This equation is the polarised form of the infinitesimal generator of the M/M/∞ queue, as described in Section 1.1 of Chafaï. Chafaï uses this equation to prove a number of inequalities concerning generalized entropy functionals.

Proof of Theorem 2.5 Given a random variable $X$ with mass function $P_X$, we define $\Lambda(X) = -\sum_x P_X(x)\log\Pi_\lambda(x)$. As remarked by Topsøe [21], the conditions required in Example 1.1 can be weakened. If $\Lambda(X) \le \Lambda(Z_\lambda) = H(Z_\lambda)$, then adapting Equation (2) gives $-H(X) + H(Z_\lambda) \ge -H(X) + \Lambda(X) = D(X\|Z_\lambda) \ge 0$, and we can deduce the maximum entropy property.
We will in fact show that if $X \in$ ULC($\lambda$) then $\Lambda(\mathbb{U}_\alpha X)$ is a decreasing function of $\alpha$. In particular, since $\mathbb{U}_0 X \sim Z_\lambda$ and $\mathbb{U}_1 X \sim X$, we deduce that $\Lambda(X) \le \Lambda(Z_\lambda)$. (A similar technique of controlling the sign of the derivative is used by Blachman and Stam [20] to prove the Entropy Power Inequality.)

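We simply differentiate and use Equations (13) and (14). Note that

$$\begin{aligned}
\frac{\partial}{\partial\alpha}\Lambda(\mathbb{U}_\alpha X) &= -\frac{\lambda}{\alpha}\sum_z \Delta^*\big(P_\alpha(z)\rho_\alpha(z)\big)\log\Pi_\lambda(z) \\
&= -\frac{\lambda}{\alpha}\sum_z P_\alpha(z)\rho_\alpha(z)\,\Delta\big(\log\Pi_\lambda(z)\big) \\
&= \frac{\lambda}{\alpha}\sum_z P_\alpha(z)\rho_\alpha(z)\log\left(\frac{z+1}{\lambda}\right).
\end{aligned} \qquad (15)$$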

By assumption $X \in$ ULC($\lambda$), so by Proposition 3.7 $\mathbb{U}_\alpha X \in$ ULC($\lambda$). This is equivalent to saying that the score function $\rho_\alpha(z)$ is decreasing in $z$. Further, note that $\sum_z P_\alpha(z)\rho_\alpha(z) = 0$. The function $\log((z+1)/\lambda)$ is increasing in $z$, a fact equivalent to saying that the Poisson mass function $\Pi_\lambda(z)$ is itself log-concave. So $\frac{\partial}{\partial\alpha}\Lambda(\mathbb{U}_\alpha X)$ is the covariance of a decreasing and an increasing function, and hence non-positive by Chebyshev's rearrangement lemma.

In fact, $\Lambda(\mathbb{U}_\alpha X)$ is strictly decreasing in $\alpha$, unless $X$ is Poisson. This follows since equality holds in Equation (15) if and only if $\rho_\alpha(z) \equiv 0$, which characterizes the Poisson distribution. □
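The conclusion of the proof can be observed numerically. This sketch is our own; it takes $X$ = Binomial(6, 1/2), so $\lambda = 3$, and `Lambda` is a hypothetical helper for $-\sum_x P(x)\log\Pi_\lambda(x)$. It evaluates $\Lambda(\mathbb{U}_\alpha X)$ on a grid of $\alpha$ and checks that it is strictly decreasing and starts at $H(Z_\lambda)$. It also checks that every $H(\mathbb{U}_\alpha X)$ stays below $H(Z_\lambda)$.

```python
import math


def poisson_pmf(lam, n):
    """First n Poisson(lam) probabilities, via Pi(k) = Pi(k-1) * lam / k."""
    probs = [math.exp(-lam)]
    for k in range(1, n):
        probs.append(probs[-1] * lam / k)
    return probs


def thin(p, alpha):
    """T_alpha: pmf of B_1(alpha) + ... + B_X(alpha) when X has pmf p."""
    q = [0.0] * len(p)
    for x, px in enumerate(p):
        for z in range(x + 1):
            q[z] += px * math.comb(x, z) * alpha**z * (1 - alpha) ** (x - z)
    return q


def add_poisson(p, beta):
    """S_beta: pmf of X + Poisson(beta), kept on the same grid as p."""
    pois = poisson_pmf(beta, len(p))
    return [sum(p[k] * pois[z - k] for k in range(z + 1)) for z in range(len(p))]


def mean(p):
    return sum(k * pk for k, pk in enumerate(p))


def U(p, alpha):
    """U_alpha = S_{lambda(1 - alpha)} o T_alpha, with lambda the mean of p."""
    return add_poisson(thin(p, alpha), mean(p) * (1 - alpha))


def entropy(pmf):
    return -sum(q * math.log(q) for q in pmf if q > 0)


def Lambda(P, lam):
    """Lambda = -sum_x P(x) log Pi_lambda(x)."""
    pois = poisson_pmf(lam, len(P))
    return -sum(px * math.log(qx) for px, qx in zip(P, pois) if px > 0)


grid = 60
p = [math.comb(6, k) / 64 for k in range(7)] + [0.0] * (grid - 7)  # Binomial(6, 1/2)
lam = mean(p)                                                      # 3.0
pmfs = [U(p, k / 20) for k in range(21)]
lambdas = [Lambda(P, lam) for P in pmfs]
entropies = [entropy(P) for P in pmfs]
H_poisson = entropy(poisson_pmf(lam, grid))
```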



5 Concavity of entropy along the semigroup 

In fact, rather than just showing that the Poisson distribution has a maximum entropy 
property, in this section we establish a stronger result, as follows. 

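Theorem 5.1 If $X \in$ ULC($\lambda$), then the entropy of $\mathbb{U}_\alpha X$ is a decreasing and concave function of $\alpha$, that is

$$\frac{\partial}{\partial\alpha}H(\mathbb{U}_\alpha X) \le 0 \quad \text{and} \quad \frac{\partial^2}{\partial\alpha^2}H(\mathbb{U}_\alpha X) \le 0,$$

with equality if and only if $X \sim \Pi_\lambda$.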

Proof The proof is contained in the remainder of this section. It involves writing $H(\mathbb{U}_\alpha X) = \Lambda(\mathbb{U}_\alpha X) - D(\mathbb{U}_\alpha X\|Z_\lambda)$ and differentiating both terms.
We have already shown in Equation (15) that $\Lambda(\mathbb{U}_\alpha X)$ is decreasing in $\alpha$. We show in Lemma 5.3 that it is concave in $\alpha$. In Lemmas 5.2 and 5.5 respectively, we show that $D(\mathbb{U}_\alpha X\|Z_\lambda)$ is increasing and convex. Some of the proofs of these lemmas are only sketched, since they involve long algebraic manipulations using Equation (14). □
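Theorem 5.1 can be spot-checked on a grid; this is a numerical illustration, not a proof. This sketch is our own, with hypothetical helpers and $X$ = Binomial(6, 1/2). It computes $H(\mathbb{U}_\alpha X)$ for $\alpha = 0, 0.05, \ldots, 1$. It then checks that the first differences are negative (decreasing) and the second differences are negative (concave).

```python
import math


def poisson_pmf(lam, n):
    """First n Poisson(lam) probabilities, via Pi(k) = Pi(k-1) * lam / k."""
    probs = [math.exp(-lam)]
    for k in range(1, n):
        probs.append(probs[-1] * lam / k)
    return probs


def thin(p, alpha):
    """T_alpha: pmf of B_1(alpha) + ... + B_X(alpha) when X has pmf p."""
    q = [0.0] * len(p)
    for x, px in enumerate(p):
        for z in range(x + 1):
            q[z] += px * math.comb(x, z) * alpha**z * (1 - alpha) ** (x - z)
    return q


def add_poisson(p, beta):
    """S_beta: pmf of X + Poisson(beta), kept on the same grid as p."""
    pois = poisson_pmf(beta, len(p))
    return [sum(p[k] * pois[z - k] for k in range(z + 1)) for z in range(len(p))]


def mean(p):
    return sum(k * pk for k, pk in enumerate(p))


def U(p, alpha):
    """U_alpha = S_{lambda(1 - alpha)} o T_alpha, with lambda the mean of p."""
    return add_poisson(thin(p, alpha), mean(p) * (1 - alpha))


def entropy(pmf):
    return -sum(q * math.log(q) for q in pmf if q > 0)


grid = 60
p = [math.comb(6, k) / 64 for k in range(7)] + [0.0] * (grid - 7)  # Binomial(6, 1/2), in ULC(3)
H = [entropy(U(p, k / 20)) for k in range(21)]
first_differences = [b - a for a, b in zip(H, H[1:])]
second_differences = [H[k - 1] - 2 * H[k] + H[k + 1] for k in range(1, len(H) - 1)]
```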

In the case of continuous random variables, Costa uses the concavity of the entropy power on addition of an independent normal variable to prove a version of the Entropy Power Inequality. This is a stronger result than concavity of entropy itself. We regard Theorem 5.1 as the first stage in a similar proof of a discrete form of the Entropy Power Inequality.



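Lemma 5.2 For $X$ with mean $\lambda$, $D(\mathbb{U}_\alpha X\|Z_\lambda)$ is an increasing function of $\alpha$.

Proof We use Equation (14). Omitting arguments for the sake of brevity, note that

$$\frac{\partial}{\partial\alpha}\left(P_\alpha\log\frac{P_\alpha}{\Pi_\lambda}\right) = \frac{\partial P_\alpha}{\partial\alpha}\log\frac{P_\alpha}{\Pi_\lambda} + \frac{\partial P_\alpha}{\partial\alpha},$$

and that $\sum_z \frac{\partial}{\partial\alpha}P_\alpha(z) = 0$. This means that

$$\begin{aligned}
\frac{\partial}{\partial\alpha}D(\mathbb{U}_\alpha X\|Z_\lambda) &= \frac{\lambda}{\alpha}\sum_z \Delta^*\big(P_\alpha(z)\rho_\alpha(z)\big)\log\frac{P_\alpha(z)}{\Pi_\lambda(z)} = \frac{\lambda}{\alpha}\sum_z P_\alpha(z)\rho_\alpha(z)\log\left(\frac{P_\alpha(z+1)\,\Pi_\lambda(z)}{P_\alpha(z)\,\Pi_\lambda(z+1)}\right) \\
&= \frac{\lambda}{\alpha}\sum_z P_\alpha(z)\rho_\alpha(z)\log\big(1 + \rho_\alpha(z)\big).
\end{aligned} \qquad (16)$$

Now, as in [11], we write $\widetilde{P}_\alpha(z) = (z+1)P_\alpha(z+1)/\lambda$. $\widetilde{P}_\alpha$ is often referred to as the size-biased version of $P_\alpha$, and it is a probability mass function because $\mathbb{U}_\alpha$ fixes the mean. Notice that $\rho_\alpha(z) = \widetilde{P}_\alpha(z)/P_\alpha(z) - 1$, so that we can rewrite Equation (16) as

$$\frac{\lambda}{\alpha}\sum_z\big(\widetilde{P}_\alpha(z) - P_\alpha(z)\big)\log\left(\frac{\widetilde{P}_\alpha(z)}{P_\alpha(z)}\right) = \frac{\lambda}{\alpha}\Big(D(\widetilde{P}_\alpha\|P_\alpha) + D(P_\alpha\|\widetilde{P}_\alpha)\Big) \ge 0. \qquad (17)$$

This quantity is a symmetrised version of the relative entropy, originally introduced by Kullback and Leibler. □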
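Equation (17) can be compared with a finite-difference derivative. In this sketch of our own (hypothetical helper names, $X$ = Binomial(6, 1/2)), `P_tilde` is the size-biased mass function $\widetilde{P}_\alpha$ on the truncated grid. The script also checks that $D(\mathbb{U}_\alpha X\|Z_\lambda)$ increases along a grid of $\alpha$, as Lemma 5.2 states.

```python
import math


def poisson_pmf(lam, n):
    """First n Poisson(lam) probabilities, via Pi(k) = Pi(k-1) * lam / k."""
    probs = [math.exp(-lam)]
    for k in range(1, n):
        probs.append(probs[-1] * lam / k)
    return probs


def thin(p, alpha):
    """T_alpha: pmf of B_1(alpha) + ... + B_X(alpha) when X has pmf p."""
    q = [0.0] * len(p)
    for x, px in enumerate(p):
        for z in range(x + 1):
            q[z] += px * math.comb(x, z) * alpha**z * (1 - alpha) ** (x - z)
    return q


def add_poisson(p, beta):
    """S_beta: pmf of X + Poisson(beta), kept on the same grid as p."""
    pois = poisson_pmf(beta, len(p))
    return [sum(p[k] * pois[z - k] for k in range(z + 1)) for z in range(len(p))]


def mean(p):
    return sum(k * pk for k, pk in enumerate(p))


def U(p, alpha):
    """U_alpha = S_{lambda(1 - alpha)} o T_alpha, with lambda the mean of p."""
    return add_poisson(thin(p, alpha), mean(p) * (1 - alpha))


def relative_entropy_to_poisson(P, lam):
    """D(P || Pi_lambda) on the truncated grid."""
    pois = poisson_pmf(lam, len(P))
    return sum(px * math.log(px / qx) for px, qx in zip(P, pois) if px > 0)


grid = 60
p = [math.comb(6, k) / 64 for k in range(7)] + [0.0] * (grid - 7)  # Binomial(6, 1/2)
lam = mean(p)

D_values = [relative_entropy_to_poisson(U(p, k / 10), lam) for k in range(11)]

alpha, h = 0.5, 1e-5
numeric = (relative_entropy_to_poisson(U(p, alpha + h), lam)
           - relative_entropy_to_poisson(U(p, alpha - h), lam)) / (2 * h)
P = U(p, alpha)
P_tilde = [(z + 1) * P[z + 1] / lam for z in range(grid - 1)]  # size-biased pmf
predicted = lam / alpha * sum((pt - pz) * math.log(pt / pz) for pt, pz in zip(P_tilde, P))
```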

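Lemma 5.3 Using the definitions above, if $X \in$ ULC($\lambda$) then $\Lambda(\mathbb{U}_\alpha X)$ is a concave function of $\alpha$. It is strictly concave unless $X$ is Poisson.

Sketch Proof Using Equations (14) and (13), it can be shown that

$$\frac{\partial^2}{\partial\alpha^2}\Lambda(\mathbb{U}_\alpha X) = \frac{\lambda^2}{\alpha^2}\sum_z P_\alpha(z)\rho_\alpha(z)\left(\frac{z}{\lambda}\log\left(\frac{z+1}{z}\right) - \log\left(\frac{z+2}{z+1}\right)\right),$$

where the $z = 0$ value of $\frac{z}{\lambda}\log\frac{z+1}{z}$ is interpreted as 0. The result now follows in the same way as before. For any $\lambda$, the function $\frac{z}{\lambda}\log((z+1)/z) - \log((z+2)/(z+1))$ is increasing in $z$, so $\frac{\partial^2}{\partial\alpha^2}\Lambda(\mathbb{U}_\alpha X) \le 0$. □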
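The second-derivative expression in the sketch proof can be checked against a second central difference of $\Lambda(\mathbb{U}_\alpha X)$. This is our own numerical sketch, with $X$ = Binomial(6, 1/2), $\alpha = 0.6$ and hypothetical helper names. As in the text, the $z = 0$ value of $\frac{z}{\lambda}\log\frac{z+1}{z}$ is set to 0.

```python
import math


def poisson_pmf(lam, n):
    """First n Poisson(lam) probabilities, via Pi(k) = Pi(k-1) * lam / k."""
    probs = [math.exp(-lam)]
    for k in range(1, n):
        probs.append(probs[-1] * lam / k)
    return probs


def thin(p, alpha):
    """T_alpha: pmf of B_1(alpha) + ... + B_X(alpha) when X has pmf p."""
    q = [0.0] * len(p)
    for x, px in enumerate(p):
        for z in range(x + 1):
            q[z] += px * math.comb(x, z) * alpha**z * (1 - alpha) ** (x - z)
    return q


def add_poisson(p, beta):
    """S_beta: pmf of X + Poisson(beta), kept on the same grid as p."""
    pois = poisson_pmf(beta, len(p))
    return [sum(p[k] * pois[z - k] for k in range(z + 1)) for z in range(len(p))]


def mean(p):
    return sum(k * pk for k, pk in enumerate(p))


def U(p, alpha):
    """U_alpha = S_{lambda(1 - alpha)} o T_alpha, with lambda the mean of p."""
    return add_poisson(thin(p, alpha), mean(p) * (1 - alpha))


def Lambda(P, lam):
    """Lambda = -sum_x P(x) log Pi_lambda(x)."""
    pois = poisson_pmf(lam, len(P))
    return -sum(px * math.log(qx) for px, qx in zip(P, pois) if px > 0)


grid = 60
p = [math.comb(6, k) / 64 for k in range(7)] + [0.0] * (grid - 7)  # Binomial(6, 1/2)
lam = mean(p)
alpha, h = 0.6, 1e-3

second_difference = (
    Lambda(U(p, alpha + h), lam) - 2 * Lambda(U(p, alpha), lam) + Lambda(U(p, alpha - h), lam)
) / h**2

P = U(p, alpha)
predicted = 0.0
for z in range(grid - 1):
    rho = (z + 1) * P[z + 1] / (lam * P[z]) - 1
    first = (z / lam) * math.log((z + 1) / z) if z > 0 else 0.0
    predicted += P[z] * rho * (first - math.log((z + 2) / (z + 1)))
predicted *= lam**2 / alpha**2
```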

Taking a further derivative of Equation (17), we can show the following (the proof is omitted for the sake of brevity):

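Lemma 5.4 The relative entropy $D(\mathbb{U}_\alpha X\|Z_\lambda)$ satisfies

[The display of Lemma 5.4 is not recoverable from the source.]

where $\widetilde{P}_\alpha(z) = (z+1)P_\alpha(z+1)/\lambda$ and $\widetilde{\widetilde{P}}_\alpha(z) = (z+2)(z+1)P_\alpha(z+2)/\lambda^2$.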

Lemma 5.5 For $X$ with mean $\lambda$ and $\mathrm{Var}\,X \le \lambda$, $D(\mathbb{U}_\alpha X\|Z_\lambda)$ is a convex function of $\alpha$. It is strictly convex unless $X$ is Poisson.

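Proof Notice that the map $\mathbb{T}_\alpha$ scales the $r$th falling moment of $X$ by $\alpha^r$. This means that $\mathrm{Var}\,\mathbb{T}_\alpha X = \alpha^2\mathrm{Var}\,X + \alpha(1-\alpha)\lambda$, so that $\mathrm{Var}\,\mathbb{U}_\alpha X = \alpha^2\mathrm{Var}\,X + \lambda(1-\alpha^2)$. Hence, the condition $\mathrm{Var}\,X \le \lambda$ implies that $\mathrm{Var}\,\mathbb{U}_\alpha X \le \lambda$ for all $\alpha$. Equivalently,

$$S := \sum_z \widetilde{\widetilde{P}}_\alpha(z) = \frac{\mathbb{E}\big[(\mathbb{U}_\alpha X)(\mathbb{U}_\alpha X - 1)\big]}{\lambda^2} \le 1.$$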
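The variance identity and the definition of $S$ can be checked numerically. In this sketch (our own, with hypothetical helper names), $X$ is an arbitrary mass function on $\{0, \ldots, 3\}$ with mean 1.9 and variance 0.89. It checks that thinning scales the second falling moment by $\alpha^2$ and that $\mathrm{Var}\,\mathbb{U}_\alpha X$ matches $\alpha^2\mathrm{Var}\,X + \lambda(1-\alpha^2)$. It also checks that $S \le 1$, as expected because $\mathrm{Var}\,X \le \lambda$.

```python
import math


def poisson_pmf(lam, n):
    """First n Poisson(lam) probabilities, via Pi(k) = Pi(k-1) * lam / k."""
    probs = [math.exp(-lam)]
    for k in range(1, n):
        probs.append(probs[-1] * lam / k)
    return probs


def thin(p, alpha):
    """T_alpha: pmf of B_1(alpha) + ... + B_X(alpha) when X has pmf p."""
    q = [0.0] * len(p)
    for x, px in enumerate(p):
        for z in range(x + 1):
            q[z] += px * math.comb(x, z) * alpha**z * (1 - alpha) ** (x - z)
    return q


def add_poisson(p, beta):
    """S_beta: pmf of X + Poisson(beta), kept on the same grid as p."""
    pois = poisson_pmf(beta, len(p))
    return [sum(p[k] * pois[z - k] for k in range(z + 1)) for z in range(len(p))]


def mean(p):
    return sum(k * pk for k, pk in enumerate(p))


def variance(p):
    m = mean(p)
    return sum((k - m) ** 2 * pk for k, pk in enumerate(p))


def falling2(p):
    """Second falling moment E[X(X - 1)]."""
    return sum(k * (k - 1) * pk for k, pk in enumerate(p))


def U(p, alpha):
    """U_alpha = S_{lambda(1 - alpha)} o T_alpha, with lambda the mean of p."""
    return add_poisson(thin(p, alpha), mean(p) * (1 - alpha))


grid = 60
p = [0.1, 0.2, 0.4, 0.3] + [0.0] * (grid - 4)  # hypothetical X: mean 1.9, variance 0.89
lam, alpha = mean(p), 0.3

var_U = variance(U(p, alpha))
predicted = alpha**2 * variance(p) + lam * (1 - alpha**2)
S = falling2(U(p, alpha)) / lam**2
```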

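We will use the log-sum inequality, which is equivalent to the Gibbs inequality. It states that for positive sequences $(a_i)$ and $(b_i)$, not necessarily summing to 1,

$$D(a_i\|b_i) := \sum_i a_i\log\frac{a_i}{b_i} \ge \Big(\sum_i a_i\Big)\log\left(\frac{\sum_i a_i}{\sum_i b_i}\right). \qquad (18)$$

Since $\log u \le u - 1$, this simplifies further to give $D(a_i\|b_i) \ge \big(\sum_i a_i\big)\big(\log(\sum_i a_i) + 1 - \sum_i b_i\big)$.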
We express the first term of Lemma 5.4 as a sum of relative entropies, and recall that $\sum_z \widetilde{P}_\alpha(z) = 1$ and $\sum_z \widetilde{\widetilde{P}}_\alpha(z) = S$, simplifying the bounds on the second and third terms:

[Display (19) is not recoverable from the source.]

Using Corollary 4.2 we can expand the second (Fisher) term of Lemma 5.4 as

[Display (20) is not recoverable from the source.]



Adding Equations (19) and (20), and since $S = \mathbb{E}(\mathbb{U}_\alpha X)^2/\lambda^2 - 1/\lambda$, we deduce that

[Display (21) is not recoverable from the source.]



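Finally we exploit Cramér-Rao type relations which bound the two remaining quadratic terms from below. Firstly, since $\sum_z \widetilde{P}_\alpha(z) = 1$, the Cauchy-Schwarz inequality gives

$$\sum_z \frac{(z+1)^2P_\alpha(z+1)^2}{\lambda^2P_\alpha(z)} = \sum_z \frac{\widetilde{P}_\alpha(z)^2}{P_\alpha(z)} \ge 1. \qquad (22)$$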



Similarly, a weighted version of the Fisher information term of Johnstone and MacGibbon [10] gives that:

[Display (23) is not recoverable from the source.]



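(Note that in Equations (22) and (23), equality holds if and only if $P_\alpha = \Pi_\lambda$.) Substituting Equations (22) and (23) in Equation (21), we deduce that $\frac{\partial^2}{\partial\alpha^2}D(\mathbb{U}_\alpha X\|Z_\lambda)$ is bounded below by a positive multiple of

$$S\log S + 1 - S \ge 0,$$

with equality if and only if $S = 1$. □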



Combining these lemmas completes the proof of Theorem 5.1, since ultra log-concavity of $X$ implies that $\mathrm{Var}\,X \le \mathbb{E}X$. Indeed,

$$\sum_x P_X(x)\,x\left(\frac{(x+1)P_X(x+1)}{P_X(x)} - \lambda\right) = \mathrm{Var}\,X - \lambda \le 0,$$

since the left-hand side is again the covariance of an increasing and a decreasing function.
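The variance bound can be illustrated numerically. This sketch is our own, with hypothetical helper names. It takes a Bernoulli sum plus an independent Poisson(1.5) variable, which is ultra log-concave by Lemma 3.1. It checks $\mathrm{Var}\,X \le \mathbb{E}X$ and the identity for the covariance sum, written without division. For contrast, it checks the geometric distribution, which is log-concave but not ultra log-concave and has variance larger than its mean.

```python
import math


def poisson_pmf(lam, n):
    """First n Poisson(lam) probabilities, via Pi(k) = Pi(k-1) * lam / k."""
    probs = [math.exp(-lam)]
    for k in range(1, n):
        probs.append(probs[-1] * lam / k)
    return probs


def add_poisson(p, beta):
    """S_beta: pmf of X + Poisson(beta), kept on the same grid as p."""
    pois = poisson_pmf(beta, len(p))
    return [sum(p[k] * pois[z - k] for k in range(z + 1)) for z in range(len(p))]


def bernoulli_sum_pmf(probs, grid):
    """pmf of a sum of independent Bernoulli(prob) variables, padded to `grid` points."""
    pmf = [1.0] + [0.0] * (grid - 1)
    for prob in probs:
        pmf = [(1 - prob) * pmf[k] + (prob * pmf[k - 1] if k > 0 else 0.0) for k in range(grid)]
    return pmf


def mean(p):
    return sum(k * pk for k, pk in enumerate(p))


def variance(p):
    m = mean(p)
    return sum((k - m) ** 2 * pk for k, pk in enumerate(p))


grid = 60
x_pmf = add_poisson(bernoulli_sum_pmf([0.2, 0.5, 0.9], grid), 1.5)  # in ULC(3.1)
lam, var = mean(x_pmf), variance(x_pmf)
# sum_x P(x) x ((x+1) P(x+1) / P(x) - lambda), rewritten without the division
cov = sum(x * (x + 1) * x_pmf[x + 1] - lam * x * x_pmf[x] for x in range(grid - 1))

geometric = [0.5 ** (k + 1) for k in range(200)]  # log-concave but not ultra log-concave
```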